Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add filtering to fieldcaps endpoint #83636

Merged
merged 17 commits into from
Feb 10, 2022

Conversation

romseygeek
Copy link
Contributor

Many consumers of the field caps API need to do some post-processing of the
results before they can use them; for instance, Kibana would like to exclude
multifields from certain field selections, or would like to display only geo_point
fields in Maps. ML and QL consumers exclude nested fields in certain
circumstances. This post-processing is possible at the moment, but can be
hacky; and in all cases it involves sending the whole (possibly very large) field
caps response over the wire and then whittling it down in the client. It is also not
guaranteed to be accurate - runtime fields may be incorrectly classified as multifields,
for example.

This commit pushes filtering into elasticsearch itself, reducing the amount of data
that needs to be transported and ensuring better accuracy. The field caps API gets
two new parameters:

  • filters - a comma-delimited list that may contain any combination of: +metadata,
    -metadata, -nested, -parent, -multifield
  • types - a comma-delimited list of field types; only fields that have a type in this set
    will be returned

The API will make best-effort attempts to apply the filters post-hoc to responses from
older nodes, so this should still work in a mixed-cluster or cross-cluster situation.

Fixes #82966, #72174

@romseygeek romseygeek added >enhancement :Search/Search Search-related issues that do not fall into other categories v8.2.0 labels Feb 8, 2022
@romseygeek romseygeek self-assigned this Feb 8, 2022
@elasticmachine elasticmachine added the Team:Search Meta label for search team label Feb 8, 2022
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@elasticsearchmachine
Copy link
Collaborator

Hi @romseygeek, I've created a changelog YAML for you.

@sethmlarson sethmlarson added the Team:Clients Meta label for clients team label Feb 8, 2022
@elasticmachine
Copy link
Collaborator

Pinging @elastic/clients-team (Team:Clients)

- is_true: fields.misc\\.keyword

---
"Field type filters":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a test case for multiple values in the fields parameter?

@javanna javanna removed the request for review from nik9000 February 9, 2022 13:40
Copy link
Member

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started reviewing this a while ago, then I got sidetracked, but I eventually got back to it. I left some comments that I think we should address, even if this PR has already been merged.

@@ -77,6 +77,15 @@ include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=index-ignore-unavailab
(Optional, Boolean) If `true`, unmapped fields are included in the response.
Defaults to `false`.

`filters`::
(Optional, string) Comma-separated list of filters to apply to the response. The
following filters are supported: +metadata,-metadata,-parent,-nested,-multifield
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is the + notation supported only for metadata? Would it make sense to expand on what each one of these mean, with a link to the relevant docs explanation for each one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll expand the docs. The idea is that - and + work in much the same way they do in our query parsers = - means 'must not match', and + means 'must match'. So you can ask for only metadata, or to exclude metadata. For the others it doesn't make much sense to do anything except exclude them, with the exception of parent where it sort of makes sense to say 'include parent data as well as everything else', so a bare parent term is supported.

following filters are supported: +metadata,-metadata,-parent,-nested,-multifield

`types`::
(Optional, string) Comma-separated list of field types to include. Any fields that
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we link the docs page with all of the field types here? Maybe also include an example at the end of the sentence?

`types`::
(Optional, string) Comma-separated list of field types to include. Any fields that
do not match one of these types will be excluded from the results. Defaults to empty,
meaning that all field types are returned.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more question, just to double check: types:keyword means either runtime or indexed field, no distinction between the two for field_caps?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keyword runtime fields return KeywordFieldMapper.CONTENT_TYPE as their family name (which is what is actually checked here). I also wonder if it's worth adding a -runtime, +runtime pair of filters. I know you don't like making it too obvious what is runtime and what isn't, but I think it can still be helpful for clients.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's wait for them to ask? :)

if ("-parent".equals(filter)) {
return false;
}
if ("parent".equals(filter)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see this one documented: is it +parent or just parent?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just plain parent and acts as the equivalent of a SHOULD clause - in other words, do return parent information. Clients that aren't filtering out nested or multifield entries still use this to do their own checks, but once they start using filters they can filter out parents instead. At some point in the future I envisage that we will filter out parents by default, and then clients will have to explicitly include parent if they still want that information.

@@ -336,6 +336,9 @@ public NestedLookup nestedLookup() {
}

public boolean isMultiField(String field) {
if (fieldTypeLookup.isRuntimeField(field)) {
return false;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that there are existing callers of this method in FieldMapper. Was it not accurate before? I get thrown off that we look at field mappers later, and runtime fields don't have field mappers, so why do we need this additional if and what effect does it have?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is used when validating copy_to fields, in which case we don't really care about runtime fields. But it is important for field cap filtering, because otherwise we might incorrectly a runtime field foo.bar as a multifield of an otherwise existing field foo.

Thinking about it though, I reckon we can do this without looking at runtime fields just by checking to see if both the passed-in field name and its parent are field mappers. Let me see what I can come up with.

instance.runtimeFields()
);
}
default -> throw new IllegalStateException("The test should only allow 7 parameters mutated");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are missing a serialization tests that uses multiple versions, to exercise the bwc layer.

)
);
assertThat(
request.getDescription().length(),
lessThanOrEqualTo(1024 + ("indices[" + "s9999,... (9999 in total, 9999 omitted)], fields[a]").length())
lessThanOrEqualTo(1024 + ("indices[" + "s9999,... (9999 in total, 9999 omitted)], fields[a], filters[], types[]").length())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here too, I think we are missing a serialization tests that uses multiple versions, to exercise the bwc layer.

@romseygeek
Copy link
Contributor Author

Thanks for all the comments @javanna, I will open a follow-up PR to address them.

romseygeek added a commit that referenced this pull request Mar 29, 2022
* adds a test for mixed cluster requests
* fixes a bad stream version check (above test will fail if this isn't included)
* replaces private FieldCapsFilter interface with Predicate
* renames 'allowedTypes' to 'types' to maintain consistency with external API
* adds javadoc to ResponseRewriter
* removes isRuntimeField from FieldTypeLookup

Relates to #83636
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement :Search/Search Search-related issues that do not fall into other categories Team:Clients Meta label for clients team Team:Search Meta label for search team v8.2.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add extra filters to field caps
5 participants